This article originally appeared in The Bar Examiner print edition, Winter 2024-2025 (Vol. 93, No. 4), pp. 15–20.
In this issue’s “Seven Questions” column, we bring you an interview with National Council of State Boards of Nursing (NCSBN) CEO Philip Dickison, PhD, RN.
NCSBN, as part of its mission to empower and support nursing regulators in their mandate to protect the public, administers the National Council Licensure Examination (NCLEX®), a standardized test that every US state regulatory board uses to determine if a candidate is ready to become licensed as a nurse. Dr. Dickison, who holds a PhD in quantitative research, evaluation, and measurement, recently steered the launch of the Next Generation NCLEX (NGN), which debuted in April 2023.
Dr. Dickison became CEO of NCSBN in October 2023; prior to that he served as the organization’s Chief Operating Officer, a post he had held since 2017. He was NCSBN’s Chief Officer of Examinations for seven years before that. Before joining NCSBN, Dr. Dickison was Director of Health Professions Testing at Elsevier, Inc., and Associate Director at the National Registry of Emergency Medical Technicians.
Licensure exams for lawyers and nurses share common goals of protecting the public and ensuring that newly licensed practitioners enter their respective professions with the knowledge, skills, and abilities they need to perform their work. We asked Dr. Dickison about the impetus for revising the NCLEX and the process for determining how it should change, NCSBN’s process for testing new exam items and ensuring their validity and reliability, its communication with stakeholders about the upcoming changes, and the timeline for NGN development and changes from the previous exam.
The interview was conducted by Melissa K. Hansen, Executive Director of the Maine Board of Bar Examiners, Shela Shanks, Director of the Committee on Admissions for the District of Columbia Court of Appeals, and Claire J. Guback, NCBE Editorial Director and editor of The Bar Examiner. The interview has been edited for length and clarity.
1. Tell us about the National Council of State Boards of Nursing, the relationship between NCSBN and the jurisdictions it serves, and how the NCLEX fits into the licensing process for nurses.
NCSBN celebrated its 45th anniversary in 2023. Since its inception, NCSBN has developed exam materials for the jurisdictions it serves. This came in the form of an exam pool originally, from which each jurisdiction drew items to develop its own exam.
The NCLEX evolved from the foundation of ensuring public protection and facilitating portability of nurses from jurisdiction to jurisdiction—as well as alleviating some of the workload for the jurisdictions, which is one of NCSBN’s core responsibilities. The NCLEX started as a mass-administration, paper-based, two-day exam; with technological advances it evolved to a computerized adaptive testing (CAT) exam. CAT is a method that uses computer technology and measurement to increase the efficiency and accuracy of the exam process.
NCSBN’s membership is comprised of the nursing regulatory bodies (NRBs) in the 50 US states, the District of Columbia, and four US territories. There are 7 exam user members and 23 associate members that are either NRBs or empowered regulatory authorities from other countries or territories. Each of these jurisdictions has agreed to use the NCLEX as an element in its licensure decisions; this allows NCSBN to produce a single exam used across the United States and ultimately create a way to share licensure information across the jurisdictions. We originally divided the jurisdictions up into geographic areas, which helped facilitate the discussion of issues and allowed us to help resolve those issues through model regulations and ultimately to become a primary data repository for issues related to the workforce and discipline for all the jurisdictions. What started with an exam has grown into something much broader as we became involved in the regulatory space. And the jurisdictions’ agreement to use a single exam developed by an external organization to measure candidate readiness ensured credibility, validity, and reliability in their licensing decisions.
NCSBN’s influence extends beyond the United States. NCSBN supports nursing regulators across the world and facilitates regulatory solutions to borderless health care delivery. Its exams are internationally recognized as the leading nursing exams.
2. What was the impetus and overall goal for undertaking development of the NGN, and what was your process for determining how the NCLEX should change?
At NCSBN our philosophy is that we should, every five years or so, undertake a fundamental, comprehensive look at operations to evaluate whether they are still accomplishing organizational goals and doing so in the best way possible. We conduct a practice analysis every three years. There is generally some change in nursing practice over that period, but that has especially been the case in recent decades, so we also run an annual mini practice analysis to identify any trends that might require immediate attention.
More than a decade ago, the chair of the NCLEX Examination Committee asked the important question: Is the NCLEX measuring the right things? Although we were confident that we were effectively measuring entry-level competency, we committed to fully researching the answer and began a deep study of the exam. We took a fresh look at our practice analysis process to determine whether we were in fact measuring the right things to the fullest extent possible.
In my opinion, referring to previous practice analyses involves a certain bias toward their findings and methods, so I suggested starting from a blank slate. We hired 12 industrial-organizational scientists as consultants to assist us with a brand-new practice analysis grounded in fresh observations of the nursing profession.
We divided the scientists into teams of three or four and had them observe nurses for 24 hours in various settings—long-term care facilities, major medical centers, rural hospitals, and doctors’ offices—and had them write down everything they saw. This exercise took place over a period of three days to gather information about what nurses actually do, versus what nurses tell us they do or what we think they do.
We received about 2,500 pages of observations; we then did deep analysis to tie those observations to tasks. The primary finding that connected nurse characteristics, tasks, and outcomes was that approximately 65% of everything a nurse does requires communication, problem-solving, and clinical judgment.
With this valuable information, we returned to the scientists and NCSBN’s psychometricians with the goal of analyzing our current measurement model and determining whether we were measuring these skills at the right level. The conclusion: we were measuring these skills to the best of our ability given the current item types—primarily multiple-choice questions—but we were doing so in a limited fashion.
In my time as both a nurse and a psychometrician in various organizations, I’ve observed a general tendency to assume that if an item is purported to measure a certain skill, it does so. With my blank-slate approach, I aimed to flip that assumption and start with a construct from which we would then build items to measure those skills.
The next question was how to accomplish this. We started with a global literature review to define problem-solving and clinical judgment in the nursing setting. We discovered that in the current environment, only about 20% of employers felt that entry-level nurses had the clinical judgment competency that nurses need on day one.
We also discovered that about 60% of discipline cases involving first-year nurses were related to problem-solving, clinical judgment, and communication. And of that 60%, about 50% related to emergency critical issues. This clearly pointed to the fact that although we were measuring these skills, we were not measuring them as well as we could be.
Having answered the Exam Committee’s question, we next started from the construct and operational model we now had for clinical judgment and asked which of our current items actually measured that construct. Very few did. We clearly needed to rethink our assessment model and create different item types.
To this end, we invited a group of about 15 skills-teaching experts to NCSBN headquarters in Chicago. The goal was to have them, over the course of a full day, brainstorm ideas for new items that measure the skills we sought to assess. By the end of that day, we had over 100 ideas. After consolidating overlap among many of them, we narrowed the selection down to about 20. Behind the scenes, we had also brought in about 10 technology experts, and gave them until the next morning to build prototypes of these items to share with the skills experts.
3. What was your process for testing the new items and ensuring their validity and reliability?
We began by pilot testing these 20 prototype items with real candidates. We enlisted 100 nursing students in their senior year of nursing school for talk-aloud studies, with eight psychometricians and measurement experts listening as the students talked through each item.
From there, we narrowed the 20 items down to 14 usable ones. We also concluded that single items didn’t accomplish what we needed; instead, items in the context of a practice scenario were more suited to the skills assessment we sought. To ascertain psychometric stability for such items, we built case studies and did research on independence of items. We confirmed that moving to a polytomous scoring model (in which candidates can earn partial credit for an answer) was in fact robust and yielded stable measurements. We determined that the exam could still have standalone items—given the range of skills and knowledge to be measured—but inserting items based on case studies was important to properly assess clinical judgment.
We then conducted a field test of the new items, with the goal of enlisting at least 20,000 to 25,000 test takers.¹ Since our purpose was to test the items rather than the candidates’ knowledge, we broke the test up into smaller forms of 20 items. Within those 20 new items we embedded 2 or 3 items from the current test that the candidate hadn’t yet seen but for which we had stable estimates; having comparability data between the new items and the existing items gave us valuable information on validity and reliability for the new exam. Over a period of 18 months, we gave candidates the option to take this smaller test form and give us feedback. More than 800,000 NCLEX candidates participated over several years, far exceeding our goal.
To address any motivation bias (i.e., candidates who were attempting to access the new questions without trying to legitimately answer them), we built a very conservative caution index (whereby we calibrated the items using different methods for identifying only motivated students). Applying that index eliminated about half of candidate responses, but that still left us around 380,000 data points for the 20 prototype items—translating to millions of data points overall.
With those in hand, we were able to test our polytomous scoring model, including our independent questions, which proved to be stable to the extent that we saw a 25% reduction in our error of measurement. In other words, we achieved more precision in making our decisions, and decision accuracy improved.
4. What was your process for communicating with stakeholders about the upcoming changes?
Historically, there has always been a bit of a wall regarding the exam between NCSBN and both nursing educators and publishers, rooted in a perception that communication about the exam with either of these groups might compromise it—a perception I strongly disagreed with. I suggested, when we embarked on this journey over 10 years ago, that we begin to break down that wall. We actively expanded communication with educational associations, accreditors, publishers, and test prep companies. I and other NCSBN staff went to their meetings annually and kept them informed about all NGN developments. Our strong communications team was invaluable during this time.
We also created an online quarterly update, including what we called “NGN Talks,” and encouraged everyone to share it broadly to ensure all stakeholders had the most current, accurate information.
As we got closer to NGN launch, we had to consider the educational publishing cycle. Generally, nursing books are published on a five-year cycle; thus, at five years out, we knew we needed to involve educational publishing so that they could prepare for updating their books before NGN launch. NCSBN held an invitational two-day education summit for the major publishers with the goal of showing them our model and where the NCLEX was headed, thereby creating a space where they could ask questions as they went into their editorial cycle.
Finally, I decided at the three-year mark that we needed a big marketing campaign. We built a communications plan to market to legislators, different from the one for educators and candidates. We engaged a not-for-profit arm of a major broadcasting network and were able to benefit from their free services and outreach assistance, using NCSBN’s own marketing team to create video content.
The campaign was also highly successful at improving perceptions of the NGN. Using Nielsen ratings, we discovered that our biggest negativity rating among industries and stakeholders was with the educators, at a rating of 80%. Two years later, after that campaign, we had an 85% positivity rating from educators. We continued that campaign process for three years—well worth the effort, as we didn’t experience nearly the amount of pushback at NGN launch as expected.
5. How long had the previous NCLEX been used? What was the general timeline of the NGN development process from initial discussions to launch?
In 2010, plans for minor revisions to the NCLEX were already underway, with a goal to launch in 2015. Even with the proposed minor revisions, we knew a 2015 launch date was overly ambitious. Instead, I advocated for more time to do a comprehensive, thorough review of the exam. During a networking session with a colleague, I proposed my idea. I was told it would take 35 or even 40 years to accomplish what I’d proposed and that the measurement models didn’t even exist. I replied that I was determined to accomplish it in 10 years.
It was in fact almost a decade to the day from the Exam Committee’s initial question to the NGN launch. Building the in-house expertise to make this happen, rather than relying on outside sources, was a crucial part of that success—expertise we’ve maintained to this day to support other projects.
6. To what extent does the NGN differ from the previous exam, and how did you prepare candidates for the changes?
The NGN consists of five new item types that measure nursing clinical judgment:
- Extended multiple response, which allows candidates to select one or more answer options. This item type is similar to the previous NCLEX multiple-response items but with more options, and it uses partial-credit scoring.
- Extended drag and drop, which allows candidates to move or place response options into answer spaces. This item type is like the previous NCLEX ordered-response items, but not all options may be required to answer the item, and some items may have more response options than answer spaces.
- Cloze, which allows candidates to select one option from a drop-down list. An item can include more than one drop-down list, and the lists can be used as words or phrases within a sentence, and within tables and charts.
- Enhanced hot spot, which allows candidates to select their answer by highlighting predefined words or phrases; they can select or deselect the highlighted parts by clicking on the words or phrases. This item type allows candidates to read a portion of a client medical record and select the words or phrases that answer the item.
- Matrix/grid, which allows candidates to select one or more answer options for each row and/or column. This item type is useful in measuring multiple aspects of the clinical scenario with a single item.
In terms of preparing candidates, there was fundamentally no change in how they interacted with the exam, as we were already using CAT, but it was important that they see the new case-study format. We took three sets of registered-nurse case studies and three sets of practical-nurse case studies and turned them into a sample pack, which we made publicly available. We then built a tutorial that allowed potential candidates to interact with the items as they exist on the screen.
7. What is your typical standard-setting process for the exam, and did the NGN involve any changes to scoring or standard setting?
Every three years we go through a standard-setting process on the NCLEX to reevaluate the standard by which minimal competence is determined. This involves a survey of educators and their thoughts and beliefs about that standard—whether the NCLEX is measuring the correct things and whether the difficulty level is appropriately set. We ask the same questions of employers and their experience with the nurses who work for them.
Over a three-day period, we brought in a diverse group of 12 nurses (based on characteristics such as nurse workforce distribution, age, diversity, occupational role, and location of work), including two newly licensed nurses who had passed the NCLEX in the previous six months, to take a mock maximum-length exam (165 items). The items in the mock exam were selected to adhere to the test plan as well as the current passing standard and difficulty level. The panelists then used a modified Angoff approach to provide ratings on the difficulty of items. The panelists’ judgments were then summarized and analyzed by the psychometric team to arrive at the recommended passing standard, which was ultimately taken forward to the board of directors for formal approval and adoption.
Some of those items are live items, so we know how candidates respond to them. The subject matter experts do not know this, however. For each item, we ask them how many of 100 candidates would answer it correctly (not necessarily how many should answer it correctly but how many are likely to answer it correctly). One expert might identify an item as a difficult question that maybe only 20% of candidates would answer correctly. Another expert might say 90%. We then ask the two experts who gave the highest and lowest ratings among the 12 people participating to argue their points.
We follow this process through 10 items and then have the experts review those items again, to decide whether to maintain or change their evaluation, and come to an overall rating. We use those findings to say that the group rating on this item is, for instance, 55 with a certain error of measurement. We do that for every item on the exam.
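The rating step described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not NCSBN's procedure: the function names and the sample ratings are hypothetical, the "error of measurement" is approximated here by the standard error of the panel mean, and summing item means into an expected raw score is one common way to summarize Angoff ratings.

```python
from math import sqrt
from statistics import mean, stdev


def summarize_item(ratings):
    """Return (mean rating, standard error of the mean) for one item.

    Each rating is one panelist's estimate of how many candidates
    out of 100 would answer the item correctly.
    """
    n = len(ratings)
    avg = mean(ratings)
    # Standard error of the panel mean (an assumed stand-in for the
    # "error of measurement" mentioned in the interview).
    se = stdev(ratings) / sqrt(n) if n > 1 else 0.0
    return avg, se


def extreme_raters(ratings):
    """Return the indices of the lowest and highest raters.

    These are the two panelists who would be asked to explain their views.
    """
    indices = range(len(ratings))
    low = min(indices, key=lambda i: ratings[i])
    high = max(indices, key=lambda i: ratings[i])
    return low, high


def expected_raw_cut(item_means):
    """Sum per-item mean ratings (as proportions) into an expected raw score."""
    return sum(m / 100 for m in item_means)


# Hypothetical ratings from a 12-person panel for a single item.
ratings = [20, 45, 50, 55, 55, 60, 60, 60, 65, 70, 75, 90]
avg, se = summarize_item(ratings)
low, high = extreme_raters(ratings)
print(f"group rating: {avg:.2f}, standard error: {se:.2f}")
print(f"panelists asked to discuss: {low} and {high}")
```

In practice the ratings would be collected again after the discussion, and the same summary would be repeated for every item on the mock exam.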
The findings are converted to a logit (a unit of measurement to report relative differences between candidate ability estimates and item difficulties), which is presented to the board of directors on behalf of the Exam Committee along with their standard error of measurement, with a recommendation of the proposed passing score. The results typically follow a pattern seen over the years: educators generally think the test is too hard, and employers think the test is too easy. The standard-setting group always falls somewhere in the middle, either skewed toward the employer or skewed toward the educator.
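For readers unfamiliar with the unit, the logit can be written out under the dichotomous Rasch model mentioned in this interview. The notation is assumed for illustration and is not taken from NCSBN materials: θ stands for a candidate's ability and b for an item's difficulty, both expressed in logits.

```latex
P(X = 1 \mid \theta, b) = \frac{e^{\theta - b}}{1 + e^{\theta - b}},
\qquad
\ln \frac{P(X = 1 \mid \theta, b)}{1 - P(X = 1 \mid \theta, b)} = \theta - b
```

On this scale, a candidate whose ability equals an item's difficulty has a 50% chance of answering correctly, and each additional logit of ability multiplies the odds of a correct answer by e (about 2.72).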
It is then up to the board of directors, armed with the data I have collected and based on their expertise, to decide the passing score. My responsibility is to present the research, and I can explain the spread and margin of error, but it is ultimately not my role to weigh in with a recommendation. However, once the board reaches a decision, they must support that decision, ideally based on both their expertise and the data. I often know, based on my own experience and our probability model, what the impact of that recommended passing score will be. This allows me to prepare for public reaction, and I am armed with defensible reasons for why the board chose that specific passing score based on our extensive data.
For the NGN, because the scoring moved to a polytomous model and we now measured case studies in addition to individual items, we had to alter our standard-setting methods and underlying scoring model to account for this. A single-parameter (item difficulty) Rasch model for dichotomous items, often considered the simplest yet most widely used in high-stakes assessment, was already being utilized for our scoring purposes. We examined and rigorously tested multiple models, whether by altering existing ones or developing our own. We determined that the best path forward was a commonly used extension of the Rasch dichotomous model known as the Rasch partial credit model. This model still only uses the item difficulty as the single parameter but is flexible and well suited to tests containing single-point (dichotomous) and multipoint (polytomous) items.
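To make the partial credit idea concrete, here is a minimal Python sketch of the textbook Rasch partial credit model. It is an illustration only, not NCSBN's scoring code: the function names, the ability value, and the step difficulties are hypothetical, and the formulation follows the standard one in which each score step of an item has its own difficulty in logits.

```python
from math import exp


def pcm_probabilities(theta, step_difficulties):
    """Return the probability of each score 0..m on one item.

    theta: candidate ability in logits.
    step_difficulties: difficulty of each score step, in logits
        (one step for a dichotomous item, several for a polytomous item).
    """
    # Score x has weight exp(sum of (theta - delta_k) for k = 1..x);
    # score 0 corresponds to the empty sum, which is 0.
    cumulative = [0.0]
    for delta in step_difficulties:
        cumulative.append(cumulative[-1] + (theta - delta))
    weights = [exp(c) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(theta, step_difficulties):
    """Return the expected number of points earned on the item."""
    probs = pcm_probabilities(theta, step_difficulties)
    return sum(score * p for score, p in enumerate(probs))


# Hypothetical dichotomous item: ability equal to difficulty.
print(pcm_probabilities(0.0, [0.0]))

# Hypothetical three-point item with steps at -1, 0, and 1 logits.
print(pcm_probabilities(0.5, [-1.0, 0.0, 1.0]))
print(expected_score(0.5, [-1.0, 0.0, 1.0]))
```

With a single step the function reduces to the dichotomous Rasch model, which is consistent with the point that one model can handle both single-point and multipoint items on the same scale.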
An important finding is that we could keep the same scale we’d had in place for 25 years, meaning we have forward and backward comparability.
I believe our experience at NCSBN has significance as a model for measurement in other high-stakes professions. The NGN launched on April 1, 2023, and has proven to be tremendously stable. As of March 31, 2024, we have had more than 400,000 candidates in the United States and Canada take the test, and it has held up exactly as we predicted.
Note
- Editor’s note: For context, there are approximately 5.7 million registered and practical nurses and approximately 1.3 million licensed attorneys in the United States. In 2024, 313,988 examinees sat for the NCLEX-RN and NCLEX-PN, combined. See https://www.ncsbn.org/nursing-regulation/national-nursing-database/licensure-statistics.page; https://www.americanbar.org/news/profile-legal-profession/demographics/; and https://www.ncsbn.org/public-files/NCLEX_Stats_2024_Q3_Passrates.pdf. Also in 2024, 49,748 examinees sat for the bar exam in a US jurisdiction administering one or more NCBE bar exam components. See https://www.ncbex.org/statistics-research/bar-exam-results-jurisdiction. NCBE estimates that by July 2026 it will have pre-tested NextGen bar exam content with over 48,000 recent bar exam takers and 3L/4L law students.